DRA: Derived Attributes#6081
Conversation
217ed13 to
7d68c53
Compare
7d68c53 to
df713b1
Compare
|
@thameem-abbas I think this might be useful for your use cases. |
|
cc @pohly |
|
/cc |
pohly
left a comment
There was a problem hiding this comment.
Looks okay to me.
Shadow API review: we need to sort out some details, see comments.
| - **Environment**: The CEL environment for `Expression` is exactly the same | ||
| as that for `CELDeviceSelector`, containing a single variable `device`. | ||
| - **Return Type**: The CEL expression must evaluate to a scalar (string, | ||
| integer, boolean) or a list of scalars (`[]string`, `[]int64`, `[]bool`). |
There was a problem hiding this comment.
This "must evaluate to" cannot be validated during admission time (depends on unknown input types).
How are evaluation errors at runtime handled?
There was a problem hiding this comment.
Ah yes I should have clarified this. Missed including it here but my thoughts were to have the same behaviour as CELDeviceSelector. Precisely this
I've update the KEP to that effect.
yongruilin
left a comment
There was a problem hiding this comment.
Declarative Validation plans to support validation for new API fields without handwritten validation counterpart directly.
But for complicated validation(e.g. CEL compilable), might still need to reply on handwritten.
/cc @jpbetz
|
/cc |
Good point. Yes for matchAttribute its simpler because there is no CEL expression involved and the kubernetes scheduler can be made intelligent to be able to 'match' an element with a single-element-list. I'm referring more to the existing use of pcieRoot attribute within CELExpressions for selection purposes. In particular, this limitation highlighted in the List Attributes KEP In light of this, I'm thinking it may serve our end users better if DRA Drivers don't try to migrate their single-element to list-type and instead introduce a different attribute for lists when needed. |
|
I like the idea of different APIs if that's just made clear now before more DRA drivers standardize and it becomes harder |
|
Two different attributes force users to know which one each driver publishes. The same attribute avoids that. |
But as things exist today in KEP-5491-dra-list-types-for-attributes, we don't yet have a solution which covers all backward compatibility issues and the migration of single scalar to a list type isn't transparent. So the cost of migrating an attribute from scalar to a list is:
These are all great discussions to have, but is it fair to say a solution to this problem is better discussed in the scope of KEP-5491 and is independent of this KEP? CC: @everpeace |
+1. There's very active work about adding the pcieRoot attribute support to the dra-driver-cpu which we aim to ship in 0.2.0 and we could use some guidance, but I don't want to derail the KEP review. Perhaps a topic for the WG-device-management meeting? |
|
/cc @ania-borowiec |
|
There are no concerns regarding this feature from the scheduler perspective. The time overhead is linear, so fully acceptable. Leaving the usability aspects on wg-device-management to decide. |
1 similar comment
|
There are no concerns regarding this feature from the scheduler perspective. The time overhead is linear, so fully acceptable. Leaving the usability aspects on wg-device-management to decide. |
|
/lgtm (sig network) networking community is using DRA as extension mechanism, but is unrealistic for networking to agree in all the possible standard attributes in such fragmented space. This allow us to provide a simple mechanism for iteration |
|
Thanks for the reviews @dom4ha and @aojea @johnbelamaric and @pohly -- Can one of you give the final approvals (including PRR)? Thanks |
pohly
left a comment
There was a problem hiding this comment.
Some nits, a suggestion for integration/performance testing, otherwise almost there as far as I am concerned.
| // +featureGate=DRADerivedAttributes | ||
| // +listType=map | ||
| // +listMapKey=name | ||
| // +k8s:optional |
There was a problem hiding this comment.
Nit: I know I suggested +k8s:optional as replacement for +optional earlier, but since then I learned that both are needed (the former for declarative validation, the latter for OpenAPI).
There was a problem hiding this comment.
No worries. Updated with +k8s:optional
| // - attributes (map[string]object): the device's attributes, grouped by prefix | ||
| // (e.g. device.attributes["dra.example.com"] evaluates to an object with all | ||
| // of the attributes which were prefixed by "dra.example.com"). | ||
| // - capacity (map[string]object): the device's capacities, grouped by prefix. |
There was a problem hiding this comment.
Nit: let's just say "which carries the same properties as in a CELDeviceSelector`?
For example, allowMultipleAllocations probably will be available (why should it not be?), but isn't described here.
There was a problem hiding this comment.
Sure, updated the statement to:
// The expression's input is an object named "device", which carries the
// same properties as in a CELDeviceSelector.
| - Verify correct CEL evaluation and constraint matching. | ||
|
|
||
| ##### Integration tests | ||
| - `test/integration/scheduler/dra`: |
There was a problem hiding this comment.
Instead of doing this in test/integration/scheduler/dra, do this in test/integration/scheduler_perf/dra/derived-attributes (new folder). Make sure to include a realistic scenario (number of attributes, complexity of the CEL expressions), then define a simple case for correctness checking and larger cases for performance measurement.
See consumable capacity (currently being updated in kubernetes/kubernetes#139511 because we didn't do that well initially).
With that done I don't see a need to add more testcases for this to test/integration/scheduler/dra.
There was a problem hiding this comment.
Thanks. Updated the directory and things to keep in mind during implementation.
| #### Beta | ||
| - Gather feedback from DRA driver maintainers (SIG Node / SIG Network). | ||
| - Any additional e2e tests implemented and running in Testgrid canaries. | ||
| - Verify scheduler performance and latency overhead with large device counts. |
There was a problem hiding this comment.
This is what you'll need the scheduler_perf cases for.
There was a problem hiding this comment.
Thanks, updated a reference to the scheduler_perf test discussed in the other comment.
johnbelamaric
left a comment
There was a problem hiding this comment.
A few minor questions/comments but PRR generally looks good. Please respond to these and then I can give approval
| - [x] Feature gate (also fill in values in `kep.yaml`) | ||
| - Feature gate name: DRADerivedAttributes | ||
| - Components depending on the feature gate: kube-apiserver, | ||
| kube-controller-manager, kube-scheduler, kubelet |
There was a problem hiding this comment.
My bad. You did ask me to clean this up from the PRR but I missed cleaning it up here. Removed now.
| ###### Are there any tests for feature enablement/disablement? | ||
|
|
||
| Yes. Unit tests in `pkg/apis/resource/validation` verify that | ||
| `derivedAttributes` and non-FQDN `matchAttribute` strings are rejected when the |
There was a problem hiding this comment.
Also cover distinctAttribute
There was a problem hiding this comment.
Explicitly worded distinctAttributes too.
| 2. When evaluating candidate devices for a request, the plugin executes the | ||
| cached CEL expressions against each `Device` object. The computed values are | ||
| stored in a temporary map associated with the candidate device. | ||
| 3. When evaluating `matchAttribute` constraints during the plugin |
There was a problem hiding this comment.
it should apply to anything in constraints, not just matchAttribute. Today that means also distinctAttribute.
Also: can derived attributes be used in device request CEL selectors? ie, are they evaluated BEFORE or AFTER the selectors?
There was a problem hiding this comment.
That's a good point. In CEL expressions, the derived attributes aren't really needed (the expression itself can embed the necessary mapping). But it would be more consistent and perhaps even expected if they were present. I'm +1 for that.
There was a problem hiding this comment.
Thanks @johnbelamaric and @pohly! The use of derivedAttributes within CELDeviceSelector is something I've thought about, but intentionally did not include in this KEP.
My main reason for keeping it out of scope—as you both noted—is that it isn't strictly necessary. If a user needs that logic in the selector, the CELDeviceSelector can just embed the necessary mapping directly. Attempting to include derivedAttributes into selectors brings additional complexities, such as how they impact performance by forcing early evaluation, and how they should be referenced in the CEL environment (i.e. whether we add them to device.attributes or have to introduce a new device.derivedAttributes map)
Perhaps earlier in the KEP cycle we could have had these discussions in detail, but seeing as we're close to the KEP deadline, I'd rather avoid introducing this change without more careful consideration. If this usability experience is needed in the future, a different KEP would be a perfect opportunity to explore that avenue. Hope you both share the same thoughts :)
With that in mind, I've added this bit of clarification:
- **Evaluation Order**: Derived attributes are evaluated **after** the device
request's `CELDeviceSelector` has filtered candidate devices. They are not
injected into the selector's CEL environment and are exclusively used for
evaluating constraints like `matchAttribute` and `distinctAttribute`.
and also added a separate section ## Future Considerations > ### Derived Attributes in Device Selectors on the current rationale and future improvement through a separate enhancement (if needed)
4a206eb to
73279ea
Compare
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dom4ha, gauravkghildiyal, johnbelamaric The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |

derivedAttributes) toDeviceRequestto synthesize virtual grouping keys on the fly./sig scheduling
/wg device-management